# Wav2Vec2 Architecture

Indicwav2vec Odia
Apache-2.0
A Hindi automatic speech recognition (ASR) model based on the Wav2Vec2 architecture, developed by AI4Bharat.
Speech Recognition Transformers Other
ai4bharat
401
2
Audio Classification Model
Apache-2.0
An audio classification model fine-tuned from facebook/wav2vec2-base-960h; its intended use and training data are not specified.
Audio Classification Transformers
SinghManish
19
1
Mms Lid 512
A speech language identification (LID) model based on the Wav2Vec2 architecture, fine-tuned to identify which of 512 languages is spoken in the input audio.
Speech Recognition Transformers Supports Multiple Languages
facebook
32
2
Mms Lid 256
This is a speech language identification model based on the Wav2Vec2 architecture, capable of recognizing 256 languages, and is part of Facebook's Massively Multilingual Speech (MMS) project.
Audio Classification Transformers Supports Multiple Languages
facebook
48.38k
10
Mms Lid 126
A language identification model fine-tuned from Facebook's Massively Multilingual Speech project, supporting audio classification for 126 languages
Audio Classification Transformers Supports Multiple Languages
facebook
2.1M
26
Chinese Hubert Base
MIT
A Chinese speech model pretrained on 10,000 hours of WenetSpeech L subset, suitable for speech-related tasks
Speech Recognition Transformers
TencentGameMate
1,312
39
Wav2vec2 17
Apache-2.0
A fine-tuned speech recognition model based on facebook/wav2vec2-base, supporting automatic speech-to-text tasks.
Speech Recognition Transformers
chrisvinsen
17
0
Wav2vec2 10
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a Word Error Rate (WER) of 1.0 on the evaluation set
Speech Recognition Transformers
chrisvinsen
20
0
Wav2vec2 Xlsr 53 Russian Emotion Recognition
MIT
This is a Russian speech emotion recognition model based on the XLS-R Wav2Vec2 architecture, capable of identifying 7 basic emotions with an accuracy of 72%.
Audio Classification Transformers Other
Aniemore
1,106
13
Wav2vec2 3
Apache-2.0
A fine-tuned speech recognition model based on facebook/wav2vec2-base with a Word Error Rate (WER) of 1.0
Speech Recognition Transformers
chrisvinsen
16
0
D L Dl
A speech recognition model fine-tuned from facebook/wav2vec2-base-960h, with a word error rate (WER) of 1.0 on the evaluation set.
Speech Recognition Transformers
bkh6722
25
0
English Filipino Wav2vec2 L Xls R Test 07
Apache-2.0
This model is a fine-tuned version of jonatasgrosman/wav2vec2-large-xlsr-53-english on Filipino speech datasets, primarily used for English-to-Filipino speech recognition tasks.
Speech Recognition Transformers
Khalsuu
24
0
Wav2vec2 Base Timit Demo Colab3
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
sherry7144
24
0
Wav2vec2 Base Timit Demo Colab1
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained and evaluated on the TIMIT dataset.
Speech Recognition Transformers
cuzeverynameistaken
16
0
Wav2vec2 Base Timit Demo Colab60
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained for 60 epochs on the TIMIT dataset with a word error rate (WER) of 1.0.
Speech Recognition Transformers
hassnain
16
0
Wav2vec2 Base Timit Demo Colab7
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the TIMIT dataset with a Word Error Rate (WER) of 0.5426.
Speech Recognition Transformers
sameearif88
16
0
Wav2vec2 Base Timit Demo Colab3
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the TIMIT dataset with a word error rate (WER) of 0.5608 on the evaluation set.
Speech Recognition Transformers
sameearif88
16
0
Wav2vec2 Base Timit Demo Colab2
Apache-2.0
This model is a speech recognition model fine-tuned from facebook/wav2vec2-base, achieving a word error rate (WER) of 0.5664 on the evaluation set.
Speech Recognition Transformers
sameearif88
16
0
Wav2vec2 Base Timit Demo Colab6
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the TIMIT dataset with a word error rate (WER) of 0.5282.
Speech Recognition Transformers
hassnain
19
0
Wav2vec2 Base Timit Moaiz Explast
Apache-2.0
This model is a speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, primarily used for English speech-to-text tasks.
Speech Recognition Transformers
moaiz237
19
0
Wav2vec2 Base Timit Demo Colab1
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the TIMIT dataset with a Word Error Rate (WER) of 1.0.
Speech Recognition Transformers
tahazakir
24
0
Ctrlv Wav2vec2 Tokenizer
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base, with a word error rate (WER) of 31.38% on the evaluation set.
Speech Recognition Transformers
proseph
25
0
Wav2vec2 Base Toy Train Data Slow 10pct
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on an unspecified dataset, with a Word Error Rate (WER) of 0.7175.
Speech Recognition Transformers
scasutt
22
0
Wav2vec Tr Lite AG
Apache-2.0
This is a Turkish automatic speech recognition model based on the XLSR Wav2Vec2 architecture, trained on the Common Voice Turkish dataset.
Speech Recognition Other
adresgezgini
27
0
Wav2vec2 Timit Demo
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
asini
21
0
Wav2vec2 From Scratch Finetune Dummy
Apache-2.0
This is an Indonesian automatic speech recognition model based on the XLSR Wav2Vec2 architecture, developed by cahya and fine-tuned on the Common Voice Indonesian dataset.
Speech Recognition Transformers Other
inergi
15
0
Viwav2vec2 Base 100h
Apache-2.0
A base Wav2Vec2 model pretrained on 100 hours of unlabeled Vietnamese speech audio from the VLSP dataset, requiring fine-tuning for downstream tasks.
Speech Recognition Transformers Other
dragonSwing
19
0
Hindi Wav2vec2 Stt
A Hindi speech recognition model based on the Wav2Vec2 architecture that directly transcribes audio into text.
Speech Recognition Transformers
addy88
207
1
Wav2vec2 Xlsr Greek Speech Emotion Recognition
Apache-2.0
A Greek speech emotion recognition model based on the Wav2Vec 2.0 architecture, capable of identifying five emotions: anger, disgust, fear, happiness, and sadness.
Audio Classification Other
m3hrdadfi
213
9
Wav2vec2 Xls R 300m English
Apache-2.0
An English automatic speech recognition model fine-tuned from facebook/wav2vec2-xls-r-300m on the librispeech_asr dataset, with a word error rate (WER) of 12.29% on the LibriSpeech test set.
Speech Recognition Transformers English
vitouphy
21
3
Wav2vec2 Base Timit Demo Colab 1
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base, trained on the TIMIT dataset with a word error rate (WER) of 0.3874 on the evaluation set.
Speech Recognition Transformers
Prasadi
15
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset.
Speech Recognition Transformers
202015004
29
0
Test
Apache-2.0
This model is a fine-tuned speech recognition model based on facebook/wav2vec2-base-960h, achieving a word error rate of 21.61% on the evaluation set.
Speech Recognition Transformers
GleamEyeBeast
21
0
Timit 5percent Supervised
Apache-2.0
A speech recognition model fine-tuned on the TIMIT dataset based on facebook/wav2vec2-large-lv60, using 5% of the data for supervised training
Speech Recognition Transformers
Kuray107
31
0
Wav2vec2 Base Timit Demo Colab
Apache-2.0
A speech recognition model fine-tuned from facebook/wav2vec2-base on the TIMIT dataset, specializing in English speech-to-text tasks.
Speech Recognition Transformers
Waynehillsdev
28
0
Xls R 1b Ur
Apache-2.0
An Urdu automatic speech recognition (ASR) model fine-tuned from Facebook's wav2vec2-xls-r-1b model, trained on the Common Voice 8.0 Urdu dataset
Speech Recognition Transformers Other
HarrisDePerceptron
21
0
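
Many of the entries above quote a Word Error Rate (WER), sometimes as a fraction (1.0, 0.5426) and sometimes as a percentage (31.38%). The listing does not say how each author computed the figure. The sketch below assumes the common definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. The function name `word_error_rate` is hypothetical and whitespace tokenization is an assumption; real evaluations often normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat", "the dog sat"))
```

Under this definition a WER of 1.0 equals 100%, meaning the number of word errors matches the number of reference words, and the value can exceed 1.0 when the hypothesis contains many insertions.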